From: "Richard D. Stiennon" <richard@fe3.rust.net>
Message-Id: <9501170644.ZM19531@Fe3.rust.net>
Date: Tue, 17 Jan 1995 06:44:52 -0400
Subject: (Fwd) WWW Servers on SOLARIS Bandwidth flood on Internet

I thought the following summary of a SLIP/PPP-related problem would be of interest to other ISPs. It was discovered this past weekend by contributors to the portmaster-users mailing list.

-Richard Stiennon
RustNet

--- Forwarded mail from Ed Goldgehn <edg@OCN.Com>

To: Portmaster Users Group <portmaster-users@msen.com>

It has come to the attention of a few members of the Livingston Enterprises PortMaster Users Group on the Internet that a kernel bug in Solaris 2.X is causing an unknown (but potentially significant) number of unnecessary data packets to be placed on the Internet by WWW servers. There has also been an unconfirmed report of a similar error on SGI machines.

The kernel bug is most often exposed by httpd daemons (WWW servers) when the Solaris kernel does not recognize or receive a session disconnect after a remote user terminates their session with the WWW server. When this occurs, the remote user's session stays active on the WWW server indefinitely, and the server keeps sending data packets for it out over the Internet.

Under certain circumstances (yet to be identified), a connection can be left in the CLOSE-WAIT state with a non-zero send queue and send window. This state results in a loop that sends a 1396-byte packet over the Internet every *56 seconds* (after maximum exponential back-off; the interval is governed by the kernel parameter tcp_rexmit_interval_max). The loop continues until the WWW server is reset or the server receives an RST from the client.

According to an interpretation of the TCP code, it appears that a window close on the PC's side will cause the connection never to time out: Solaris 2.x nullifies the window close and reopens the window in order to do a window probe.
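To put the loop in perspective, here is a rough back-of-the-envelope sketch (an illustration only, not Solaris code). It assumes the retransmit interval doubles from a hypothetical 3.5-second starting value until it reaches the cap set by tcp_rexmit_interval_max, after which one 1396-byte packet goes out per cap interval for as long as the connection stays stuck. The function names and the starting value are made up for illustration; the real Solaris timer may differ in detail.

```python
from __future__ import annotations

# Back-of-the-envelope cost of one stuck CLOSE-WAIT connection.
# Assumptions (not from the original report): the interval doubles from a
# hypothetical initial value up to the cap; in steady state only the cap
# (tcp_rexmit_interval_max) determines how often a packet is sent.

SECONDS_PER_DAY = 24 * 60 * 60
PACKET_BYTES = 1396  # packet size observed in the report


def backoff_schedule(initial_s: float, cap_s: float, count: int) -> list[float]:
    """Return the first `count` retransmit intervals, doubling up to cap_s."""
    intervals = []
    interval = initial_s
    for _ in range(count):
        intervals.append(min(interval, cap_s))
        interval *= 2
    return intervals


def daily_cost(cap_s: float, packet_bytes: int = PACKET_BYTES) -> tuple[float, float]:
    """Steady-state packets and bytes per day for one stuck connection."""
    packets = SECONDS_PER_DAY / cap_s
    return packets, packets * packet_bytes


def retransmit_reduction(old_cap_s: float, new_cap_s: float) -> float:
    """Fraction of steady-state retransmits avoided by raising the cap."""
    return 1 - old_cap_s / new_cap_s


if __name__ == "__main__":
    print(backoff_schedule(initial_s=3.5, cap_s=56, count=7))
    for cap in (56, 600):
        packets, nbytes = daily_cost(cap)
        print(f"cap {cap:>3} s: {packets:7.1f} packets/day, {nbytes / 1e6:5.2f} MB/day")
    print(f"reduction from 56 s to 600 s: {retransmit_reduction(56, 600):.0%}")
```

By this arithmetic, a 56-second cap means roughly 1,543 packets (about 2.15 MB) per stuck connection per day, while a 600-second cap means 144 packets (about 0.2 MB), a reduction of roughly 90%. Multiply by the number of stuck connections across all servers to get a sense of the aggregate load.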
This situation has been observed on both the latest and earlier OS revisions.

*************************************************************************
This condition is likely to result from any standard dial-up SLIP or PPP account available from any Internet Service Provider (ISP).

This condition is likely to affect the bandwidth available to any ISP, regardless of what machines or OS the ISP uses. The more dial-up SLIP/PPP users an ISP has, the more likely it is to be affected by this problem.
*************************************************************************

Since the packets originate at the server end of the Internet and are sent to disconnected client sessions anywhere else on the Internet, nearly all ISPs are likely to be receiving incoming packets on their backend connections to the net that simply use up bandwidth. The destinations of these packets are most likely the IP addresses of dial-in users who have since disconnected.

Initial analysis suggests that this condition occurs when an ungraceful client disconnect is followed by the client dropping off the net without sending an RST. It may also be related to the use, by any number of dial-up users, of _broken_ protocol (IP) stacks on personal computers that do not send an RST (this appears to be the most common case).

An indication of the possible extent of this situation across the Internet is that the problem was originally identified with Chameleon 4.00 running on a PC, using Netscape 1.0N as the client software.

Further investigation is under way to identify more precisely the conditions that allow this problem to occur.

If you want additional information, or wish to provide additional technical input, a majordomo list has been established. To subscribe, send a mail message to:

majordomo@destek.net

with "subscribe solwww-bug" in the body of the message.
To submit something for distribution to the list, send mail to solwww-bug@destek.net.

---------------------------------------------------------------------------
Thanks to the following individuals for their part in identifying and tracking down this problem:

Paul Lind <paul@cruz.com> - first identified the problem and reported it to the PortMaster Users Group
Guido van Rooij <guido@iaehv.nl> - traced the problem to a TCP/IP stack issue, which got the ball rolling
Cor Bosman <cor@xs4all.net> - traced the problem to Solaris
Casper Dik <casper@fwi.uva.nl> - the Solaris expert who identified the specific socket state and supplied the other Solaris-specific technical information that formed the basis of this post
----------------------------------------------------------------------------

Casper has provided the following command to raise the kernel parameter to 600-second intervals. Mathematically, this setting will reduce retransmits of this nature by about 90% (one packet every 600 seconds instead of one every 56 seconds):

ndd -set /dev/tcp tcp_rexmit_interval_max 600000

The 600000 value is in milliseconds.
----------------------------------------------------------------------------

Also, thanks to Marc Evans <marc@destek.net> for setting up the mailing list to help all of us communicate effectively about this problem.

**************************************************************************
Ed Goldgehn                          E-Mail: edg@ocn.com
Sr. Vice President                   Voice:  (404) 919-1561
Open Communication Networks, Inc.    Fax:    (404) 919-1568
**************************************************************************

--- End of forwarded mail from Ed Goldgehn <edg@OCN.Com>